Summary


To achieve the greatest output from their limited ge- 
nomes, viruses frequently make use of alternative 
open reading frames, in which translation is initiated 
from a start codon within an existing gene and, being 
out of frame, gives rise to a distinct protein product. 
These alternative protein products are, as yet, poorly 
characterized structurally. Here we report the crystal 
structure of ORF-9b, an alternative open reading frame 
within the nucleocapsid (N) gene from the SARS coronavirus.
The protein has a novel fold, a dimeric tent-like
β structure with an amphipathic surface, and a central
hydrophobic cavity that binds lipid molecules.
This cavity is likely to be involved in membrane attach- 
ment and, in mammalian cells, ORF-9b associates with 
intracellular vesicles, consistent with a role in the as- 
sembly of the virion. Analysis of ORF-9b and other 
overlapping genes suggests that they provide snap- 
shots of the early evolution of novel protein folds. 


Introduction 


To achieve expression of viral proteins during infection, 
viruses exploit many aspects of the host biology, such 
as transcription (McKnight and Tjian, 1986), RNA export 
(Harris and Hope, 2000), capping and methylation (Cou- 
got et al., 2004), as well as translation (Sonenberg and 
Dever, 2003). Viruses also use specific strategies, such 
as RNA editing (Turelli and Trono, 2005) and splicing 
(Pongoski et al., 2002), to enrich the diversity of viral pro- 
teins produced from the usually rather small viral ge- 
nome. A particularly bizarre phenomenon, especially in 
RNA viruses, is the use of multiple start codons within 
a gene which gives rise to different protein products 
(Samuel, 1989). The presence of these so-called alterna- 
tive open reading frames (ORFs) and their translation 
has been shown to occur by several mechanisms (re- 
viewed in Jackson, 1996): leaky scanning, where the 
usual start codon of a gene is in an unfavorable se- 
quence context, so that a fraction of ribosomal scanning 
complexes fail to initiate and continue scanning to the 
next start codon; internal ribosome entry sites; ribosomal
shunting (where, during scanning, ribosomal
complexes translocate to a remote position on the
mRNA and initiate translation there); and translational
reinitiation after termination.


*Correspondence: dave@strubi.ox.ac.uk
3Lab address: http://www.strubi.ox.ac.uk

Alternative ORFs and their mechanisms of transla- 
tional initiation have been described in many viral sys- 
tems, including influenza B (Shaw et al., 1983), Sendai 
virus (Giorgi et al., 1983), and reovirus (Ernst and Shat- 
kin, 1985; Jacobs and Samuel, 1985), as well as group 
2 coronaviruses, such as mouse hepatitis virus (MHV) 
and bovine coronavirus (BCV) (Fischer et al., 1997; Sen- 
anayake and Brian, 1997). So far, most of the research 
on alternative ORFs has focused on the mechanisms 
for initiation of translation, while the molecular struc- 
tures and functions of the corresponding proteins have 
received little attention. This is somewhat surprising, 
given that the special nature of such proteins (each aris- 
ing from a “gene inside another gene”) is likely to give 
new insights into protein structure and evolution. 
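
The idea of a "gene inside another gene" can be made concrete with a minimal sketch that scans the three forward reading frames of a sequence and reports any open reading frame lying entirely within a longer one in a different frame. The function names and the 21-nucleotide toy sequence are hypothetical illustrations, not data from this study, and the scan ignores the initiation mechanisms (leaky scanning, shunting, reinitiation) discussed above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon since the last stop in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


def nested_alternative_orfs(orfs):
    """Pair each ORF with any ORF in a different frame that fully contains it."""
    return [
        (inner, outer)
        for inner in orfs
        for outer in orfs
        if inner[2] != outer[2] and outer[0] <= inner[0] and inner[1] <= outer[1]
    ]


# Hypothetical toy sequence: a frame-0 ORF with an out-of-frame ORF inside it.
TOY_GENE = "ATGGATGGCCCAGTAACGTAA"
orfs = find_orfs(TOY_GENE)
print(orfs)
print(nested_alternative_orfs(orfs))
```

In this toy case the inner ORF starts at an ATG one nucleotide out of frame with the outer ORF, which is the arrangement described for ORF-9b within the N-gene.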

Recently, a coronavirus was identified as the causa- 
tive agent of an emerging disease, severe acute respira- 
tory syndrome (SARS) (Ksiazek et al., 2003). During its 
first outbreak in 2003, SARS led to at least 8000 infec- 
tions and over 750 fatalities (Donnelly et al., 2003). The 
SARS coronavirus (SARS-CoV) genome consists of 
approximately 29,700 nucleotides encoding a predicted 
28 proteins (Marra et al., 2003; Rota et al., 2003), includ- 
ing several alternative ORFs. These occur in the 3’ region 
of the genome, which encodes structural and accessory 
proteins that are involved in the assembly of the virus 
particle (see Figure 1). Although the precise details of 
coronavirus assembly are not well understood, the con- 
sensus model is that the RNA genome is packaged in the 
cell cytoplasm by multiple copies of the relatively well 
conserved (Stadler et al., 2003) N-protein to form the nu- 
cleocapsid. The nucleocapsid then associates with the 
viral membrane proteins M, S, and E in the ER-to-Golgi 
intermediate compartment, where virus particles as- 
semble and bud into the Golgi. In SARS-CoV, as in other 
group 2 coronaviruses, an additional protein is synthe- 
sized (called ORF-9b in SARS-CoV and internal or I-pro- 
tein otherwise) from an alternative reading frame of the 
N-gene. In BCV, the I-protein is produced in an exact 
molar ratio with the N-protein (Senanayake and Brian, 
1997) and in MHV it is present in the assembled virion, 
suggesting that it acts as an accessory structural pro- 
tein in viral assembly (Fischer et al., 1997). The MHV 
I-protein is not essential for the production of viable
virus; however, it does confer a selective advantage 
for virus growth (Fischer et al., 1997). Although little is 
known about the equivalent protein in SARS-CoV, antibodies
against ORF-9b have been found in patients,
demonstrating that it is produced during infection (Qiu 
et al., 2005). 

We are pursuing an investigation into the structures of 
proteins from SARS-CoV (Sutton et al., 2004). Here we 
report the crystal structure of ORF-9b, which we demon- 
strate functions as an unusual lipid binding protein and 
possesses a novel fold. In addition, we show that 
ORF-9b expressed in mammalian cells colocalizes 
with cellular vesicles, indicating a role in virus assembly
via membrane association. To our knowledge, the struc- 
ture represents the first example of a viral protein trans- 
lated as an alternative open reading frame elucidated at 
high resolution, and we suggest that its unusual features 
may reflect its recent evolutionary origin. 


Results and Discussion 


Structure Determination 

The structure of ORF-9b was solved by single-wave- 
length anomalous dispersion (SAD) analysis of a seleno- 
methionated form of the protein expressed in Escheri- 
chia coli (see Experimental Procedures and Table 1 for 
details). The crystals diffracted X-rays weakly (overall 
B factor 84 Å²), limiting the analysis to Bragg spacings
of 2.8 Å (R = 26.6%, Rfree = 28.9%), and a significant portion
of the structure is not well ordered. Nevertheless,
due to the presence of 8-fold noncrystallographic sym- 
metry, we can be confident that the interpretation of 
the structure is reliable. 


The Protein Fold 
The crystal structure reveals ORF-9b to be a 2-fold sym- 
metric dimer constructed from two adjacent twisted 
β sheets (Figure 2A). Each of these sheets is formed
from β strands contributed by both monomers which
form a highly interlocked architecture reminiscent of 
a handshake (Figure 2B). These extensive interactions 
bury over 1500 Å² of surface area of each subunit on formation
of the dimer (calculated with NACCESS; Hubbard
et al., 1991), typical of the contact area of protein
dimers (Lesk, 2004), strongly suggesting a biologically 
relevant interaction. The hydrodynamic properties of 
the protein in solution (determined by dynamic light 
scattering and analytical ultracentrifugation; see the 
Supplemental Data available with this article online) 
are consistent with it being a dimer. 

The interdigitated nature of the ORF-9b dimer rests on 
a highly unusual topology of largely antiparallel β sheets
in which monomers wrap around each other (Figure 2B). 
A search of the protein database with both the monomer 
and dimer using Dali (Holm and Sander, 1993) and SSM 
(Krissinel and Henrick, 2004) gave no significant struc- 
tural hits. This suggests that the structure represents 
a novel fold. 

Analysis of the surface electrostatics of ORF-9b
shows a highly polarized distribution, in which one side
of the molecule is predominantly negatively charged
while the other is positively charged (Figure 2C). It is
interesting to note that in the electron density map, two
regions near the N terminus (residues 1-8 and 26-37)
are not resolved. Both of these disordered segments
are located on the periphery of the structure, whereas
the inner core of the protein is reasonably well ordered.


Figure 1. The Structure of the SARS-CoV Genome

Alternative open reading frames (ORF-3b,
-7b, -8b, and -9b) are highlighted in gray.
The nucleocapsid (N) gene including its internal
alternative open reading frame, ORF-9b,
is shown in an enlarged representation. S,
spike protein; 3, ORF-3; E, envelope protein;
M, membrane protein; 6, ORF-6; 7, ORF-7;
8, ORF-8; N, nucleocapsid protein (adapted
from Snijder et al., 2003; genome representation
is not to scale).


The crystal of the protein is built up from twisted
chains of ORF-9b dimers that form an open three-dimensional
mesh through end-to-end packing of dimers
via the β4β5 and β6β7 loops (Figure 2D), occluding
some 700 Å² of surface area from each dimer.


Table 1. Crystallographic Data Recording, Model Building, and
Refinement Statistics

Data Collection
  Experiment                        Single-wavelength anomalous dispersion
  X-ray source                      ESRF BM14
  Wavelength (Å)                    0.97903
  Resolution (Å)                    19.9-2.8 (2.9-2.8)
  Unique reflections                22,040
  Redundancy                        14.9 (15.1)
  Completeness (%)                  99.8 (100)
  Rmerge(a) (%)                     11.1 (b)
  I/σ(I)                            22.8 (1.3)
  Space group                       P4,
  Unit cell parameters (Å)          a = b = 140.1, c = 45.2
  No. of molecules (AU)             8
  Solvent content (%)               52

Refinement
  R value (%)                       26.6 (40.4)
  Free R value, random 5% (%)       28.9 (43.2)
  Rmsd bond length (Å)              0.009
  Rmsd bond angle (°)               1.8
  Rmsd B, bonded atoms (Å²)         7
  Rmsd Cα, core(c) (Å)              0.25
  Rmsd Cα, overall(c) (Å)           4.6

Ramachandran plot
  Allowed regions                   97.3%
  Generously allowed regions        1.5%
  Disallowed regions                1.3%

Values in parentheses refer to the highest resolution shells.

(a) $R_{\mathrm{merge}} = \sum_h \sum_i |I_i(h) - \langle I(h) \rangle| / \sum_h \sum_i \langle I_i(h) \rangle$,
where $I_i(h)$ is the ith measurement and $\langle I(h) \rangle$ is the weighted mean
of all measurements of $I(h)$.

(b) The R factor for the highest resolution shell exceeds 100%, as
expected from the extreme weakness of the data (B factor from
the Wilson plot, 84 Å²); however, the very high redundancy (15)
allows some useful data to be obtained in this shell.

(c) Core regions exclude the flexible ~4 N-terminal residues and flexible
loops: 24-41 (residues 26-37 are not modeled), 48-52, 64-71,
89-91.
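
As an illustration of the Rmerge definition in footnote (a), the following minimal sketch computes the statistic for a set of repeated intensity measurements. It is a simplification under stated assumptions: it uses an unweighted mean where the footnote specifies a weighted mean, the function name `r_merge` is hypothetical, and the toy intensities are invented for the example rather than taken from the ORF-9b data set.

```python
from statistics import fmean


def r_merge(observations):
    """Compute Rmerge from a dict mapping reflection index h to its measurements."""
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        mean_i = fmean(intensities)  # unweighted mean (assumption; see lead-in)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


# Invented toy data: two reflections measured three and two times.
toy = {(1, 0, 0): [100.0, 110.0, 90.0], (0, 1, 0): [50.0, 50.0]}
print(f"Rmerge = {r_merge(toy):.3f}")
```

Because every deviation is summed without regard to how often a reflection was measured, the statistic tends to rise with redundancy even as the merged intensities become more precise, which is relevant to the high-redundancy data described in footnote (b).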
Figure 2. The Crystal Structure of the
SARS-CoV ORF-9b Protein

(A) Cartoon representation. Two orthogonal
views are presented. In the electron density
map, residues 9-25 and 38-98 are resolved.
The secondary structural elements are labeled
for one subunit in the upper image, β1
13-16, β2 19-24, β3 42-50, β4 53-61, β5 69-75,
β6 79-82, and β7 90-98. The lipid ligand
is shown as yellow van der Waals radii
spheres. The 2-fold symmetry axis is indicated.

(B) Topology diagram illustrating the interlocked
architecture of the dimeric structure.
Monomers are colored gray and cyan. Disordered
regions are represented as dashed
lines.

(C) Surface electrostatic representation. In
this view, the top of the structure is predominantly
negatively charged (red), and the bottom
is mainly positively charged (blue). The
arrow indicates the opening of the hydrophobic
tunnel.

(D) Crystal contacts between ORF-9b molecules.
Neighboring dimers are rotated ~90°
around their longest axis, and pack end to
end via their β4β5 and β6β7 loops. This gives
rise to an assembly reminiscent of a twisted
rope.

(E) Sequence alignment of SARS coronavirus
ORF-9b and its homologs in mouse hepatitis
virus (MHV) and bovine coronavirus (BCV).


MESSRRPLGLTKPSVDQIIKIEAEGISQSRLOLL.NPTPGVWFPITPGFLALPSSKRERSFSLQOKDKECLLPMES
MASLSGPIS...PTNLEMFKPGVEELNPSKLLLLSNHQEGMLYPTILGSLELLSFKRERSLNLQRDKVCLLHOQES


The alignment was performed using MultAlin
(Corpet, 1988) and, due to the very low level of
sequence similarity, should be regarded as
provisional, pending a proper structure-based
analysis. Residue numbering refers to
the ORF-9b protein from SARS-CoV. Protein
secondary structure is represented as yellow
arrows (β strands), solid lines (random coil
conformation), and dotted lines (disordered
regions). Residues which line the hydrophobic
tunnel of ORF-9b are highlighted in green.
Magenta bars indicate residues which vary
between different SARS isolates (see Supplemental
Data for a detailed alignment). The figure
was produced with ESPript (Gouet et al.,
1999).


A Hydrophobic Tunnel

The β sheets of ORF-9b form a tent-like structure which
contains a 22 Å long central cavity, lined by hydrophobic
side chains, which spans the molecule and is open at
both ends (Figures 2E, 3A, and 3B). Within this tunnel,
the experimentally phased map shows a finger of electron
density (Figure 3A) straddling the molecular dyad,
separate from the protein portion of the map. Given
the hydrophobic nature of the cavity, we interpreted
this feature as an aliphatic molecule bound within the
ORF-9b dimer. For the purpose of crystallographic
model building and refinement, we have represented
this ligand as a hydrocarbon molecule (the resolved
electron density is consistent with an unbranched chain
of ten carbon atoms; see Figures 3A and 3B).

In order to identify the molecule bound in the tunnel,
we carried out a mass spectrometry analysis (shown in
Figure 3C) which revealed a single molecular species
of molecular weight 414 Da, significantly larger than expected
from the visible electron density. The mass spectrometry
data suggest that the ligand is a fatty acid
(or fatty acid ester) containing approximately 25 carbon
atoms. This is entirely consistent with the open architecture
of the hydrophobic pocket of ORF-9b, which would
allow such large hydrophobic molecules to be accommodated,
with atoms hanging from the end(s) of the tunnel
(invisible in our crystallographic analysis due to flexibility).
The mass analysis is incompatible with the ligand
being derived from the detergent present during cell
lysis. This suggests that the ligand was picked up from
the bacterial expression host and is tightly bound so
that it remains associated with the ORF-9b protein during
purification. In order to establish whether this was an
artifact of the expression system used, refolding experiments
(see Experimental Procedures) were performed,
which demonstrated that, in the absence of lipid, the
protein folds into molecules with hydrodynamic properties
identical to those of the lipid-bound structure. Furthermore,
thermostability assays (see Figure 3D) show
that refolded, lipid-free ORF-9b is slightly less stable
than that purified from E. coli, consistent with lipid binding
having a thermodynamically stabilizing effect.
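
The mass argument above can be checked with a short calculation. This is a sketch under assumptions not made in the text: it takes the neutral dihydroxy fatty acid composition C25H50O4 as one hypothetical candidate for the 414 Da species, and decane (C10H22) as a stand-in for the ten-carbon chain modeled in the electron density. The function name `formula_mass` is illustrative only.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def formula_mass(formula):
    """Sum monoisotopic masses for a simple formula such as 'C25H50O4'."""
    return sum(
        MONOISOTOPIC[element] * int(count or 1)
        for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    )


candidate = formula_mass("C25H50O4")  # hypothetical C25 dihydroxy fatty acid
modeled = formula_mass("C10H22")      # assumed stand-in for the modeled C10 chain
print(f"C25H50O4: {candidate:.2f} Da")
print(f"C10H22:   {modeled:.2f} Da")
print(f"fraction of ligand mass visible in the map: {modeled / candidate:.2f}")
```

On these assumptions the crystallographically visible chain accounts for roughly a third of the measured ligand mass, in line with the interpretation that the remainder is disordered at the tunnel ends.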
Figure 3. Lipid Binding of ORF-9b

(A) The protein structure is cut in the center and viewed along its longest axis to reveal the central hydrophobic tunnel. The crystallographically
resolved portion of the ligand is shown in yellow; the experimentally phased electron density (contoured at 1σ) is shown in green. The 2-fold
symmetry axis lies in the plane of the paper, as indicated.

(B) Cartoon of the residues lining the lipid binding pocket. The lipid is shown in ball-and-stick representation and the residues contributed by the
two subunits are distinguished by the color of their contact symbols.

(C) Mass spectrum of the extracted ligand molecule (designated M). Its molecular weight is 414 Da. In the mass spectrum, three significant peaks
are observed, suggestive of proton (+1 Da), sodium (+23 Da), and potassium (+39 Da) adducts of a fatty acid molecule. The observed mass is
consistent with a saturated long-chain hydroxy fatty acid C25H49O4 (mass 413 Da), such as dihydroxy-pentacosanoate, or a nonhydroxy fatty
acid ester C25H49O4 (mass 413 Da) (Christie, 1982).

(D) ThermoFluor assay result for natively purified (lipid-containing) and refolded (lipid-free) ORF-9b. In both cases, there is a strong fluorescence
peak at 65°C-70°C, which presumably represents the collapse of the dimeric ORF-9b structure. Refolded ORF-9b peaks at 66°C, whereas
natively purified ORF-9b has a peak at 69°C, demonstrating that lipid-containing ORF-9b is slightly more stable than ORF-9b without lipid,
suggesting that the lipid has a thermodynamically stabilizing effect on the protein.

(E) Proposed mode of ORF-9b membrane interaction (protein structure represented as in Figure 2B, but rotated by 90° around the vertical 2-fold
symmetry axis). The positively charged side of the protein (shown in blue) could interact with the negatively charged lipid head groups, while the
hydrophobic pocket (indicated by dashed lines) binds one or more lipid tails.


Given the width of the tunnel, it is likely that ORF-9b 
can accommodate bulkier or irregularly shaped mole- 
cules (for instance, unsaturated lipid chains). In fact, in 
the crystal structure presented here, the bound ligand 
occupies under half of the volume of the pore. This is 
considerably less than for other lipid binding proteins, 
which have more tight-fitting hydrophobic pockets 
(see Table 2). 
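
The occupancy figures quoted from Table 2 follow directly from the ratio of ligand volume to pore volume. The short sketch below reproduces that arithmetic from the tabulated volumes; the variable and function names are illustrative only.

```python
# (pore volume, ligand volume inside the pore), both in cubic angstroms, from Table 2.
POCKETS = {
    "SARS-CoV ORF-9b": (365, 166),
    "Human CD1b": (1564, 881),
    "Poliovirus": (292, 176),
}


def percent_occupied(pore_volume, ligand_volume):
    """Percentage of the pore volume filled by the resolved ligand atoms."""
    return 100.0 * ligand_volume / pore_volume


for name, (pore, ligand) in POCKETS.items():
    print(f"{name}: {percent_occupied(pore, ligand):.0f}% of the pore occupied")
```

Only ORF-9b falls below 50%, which is the basis for the statement that its ligand occupies under half of the pore.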


An Unusual Type of Membrane Binding Protein 
Expression in mammalian cells (see Experimental Pro- 
cedures) reveals that ORF-9b is membrane bound and 
appears to be associated with intracellular vesicular 
structures (Figures 4A and 4B). Given that expression 
of the protein in E. coli gives soluble protein, this sug- 
gests that ORF-9b specifically recognizes and binds to 
intracellular membranes in eukaryotes. 

Considering the polarized surface and hydrophobic 
tunnel of the protein, it is most likely that the ORF-9b
molecule immerses itself in the lipid head groups of
the membrane and becomes anchored by internalizing 
one or more lipidic tails (illustrated in Figure 3E). This 
would be the opposite of the more usual covalent at- 
tachment of lipid tails to proteins, and would represent 
an unusual mode of membrane interaction. Indeed, we 
are aware of only one similar example in the literature,
that of the conserved eukaryotic protein bet3,
which appears to interact with the Golgi by binding 
myristate chains of the phospholipid bilayer in a well- 
defined pocket (Kim et al., 2005). 


Function of the Protein in the Life Cycle
of the SARS Virus

At present, there is little published work on ORF-9b, so
the precise function of the protein remains uncertain.
Most of our current knowledge comes from homologous
gene products in other coronaviruses, most importantly
the work on MHV and BCV discussed above, where the
protein has been shown to be present as a structural
component in the virus particle (Fischer et al., 1997)
and is likely to be an accessory protein in virus assembly.
Because the protein occurs as an alternative open
reading frame of the viral nucleocapsid gene, it has
been suggested to serve as an interaction partner of
the corresponding N-protein (Senanayake and Brian,
1997).


Table 2. A Comparison of the Hydrophobic Pockets of Different Types of Lipid Binding Proteins

Protein Structure   PDB Code (Literature Reference)   Type of Ligand                           Volume of the   Volume of the Ligand   Percentage of
                                                                                               Pore (Å³)       in the Pore (Å³)       Pore Occupied
SARS-CoV ORF-9b     2CME                              Long-chain fatty acid                    365             166                    45%
Human CD1b          1GZP (Gadola et al., 2002)        Ganglioside GM2 (glycolipid)             1564            881                    56%
Poliovirus          2PLV (Filman et al., 1989)        Sphingosine (long-chain amino alcohol)   292             176                    60%

Combining these observations with our structural and 
functional studies, it seems plausible that ORF-9b has 
a role in membrane interactions during the assembly 
of the virus (although other possible functions cannot 
be ruled out). It could act as an attachment point onto 
the membrane for other proteins, such as the N-protein, 
which is known to bind to membranes via protein- 
protein interactions, even though the details are not 
well understood at present (de Haan and Rottier, 
2005). Alternatively, ORF-9b could act as a modulator 
of membranes themselves, for example by promoting 
vesicle formation and budding (both of which are key 
stages in the assembly of coronaviruses). In this regard 
the higher order assembly of ORF-9b in the crystal, as 
interconnected twisted ropes (see Figure 2D), may be 
partially recapitulated at the membrane surface, impart- 
ing some torque on the membrane. The possible role of 
ORF-9b in the assembly of the SARS virus is illustrated 
in Figure 4C. 
The Evolution of ORF-9b and Homologs in Other
Coronaviruses 

The sequence of ORF-9b is well conserved in different 
SARS isolates (see Supplemental Data). As discussed
above, several other, mainly group 2, coronaviruses 
(e.g., MHV and BCV) also have an alternative ORF in their 
nucleocapsid gene. There is good sequence conser- 
vation of the alternative ORF in these group 2 coronavi- 
ruses, but rather little similarity with SARS-CoV ORF-9b 
(see Figure 2E). This intermediate relationship reflects 
the evolution of SARS-CoV, which has been suggested 
to be an early split-off from the group 2 coronavirus sub- 
family (see Figure 5A) (Snijder et al., 2003). Functionally, 
the ORF-9b homolog of MHV acts as an accessory 
structural protein that is not essential for viral infection 
but confers a small growth advantage (Fischer et al., 
1997), presumably explaining why non-group 2 corona- 
viruses can function without the protein. Taken together, 
these observations argue that the internal ORF of the 
N-gene of group 2 coronaviruses evolved after the 
N-protein (which, in contrast, is present and reasonably 
conserved in all coronaviruses; Stadler et al., 2003). The 
most likely mechanism by which such an alternative 
ORF could arise would be as an “accidental” mistrans- 
lation of the N-gene that subsequently evolved into 
a structured and functional protein (Figure 5B). This
hypothesis is consistent with the crystal structure
presented here, as, to our knowledge, there is no evidence
that it resembles any other known protein fold.


Figure 4. Expression and Intracellular Location
of ORF-9b in 293T Mammalian Cells

(A) Differential interference contrast microscopy
(gray) overlapped with epifluorescence
using an anti-His fluorochrome-conjugated
monoclonal antibody (red). The protein is associated
with intracellular vesicular structures.

(B) Partitioning experiment, in which the 293T
cells expressing ORF-9b were fractionated
into soluble (cytosolic) and insoluble/noncytosolic
(nuclear, intravesicular, membrane-associated)
phases. Western blotting with
a specific antibody shows that ORF-9b is in
the insoluble fraction, most likely membrane
attached as suggested by (A). M, molecular
weight markers; I, insoluble phase; S, soluble
phase.

(C) Cartoon representation of the intracellular
localization of ORF-9b (red dots) and its possible
function in the assembly of the SARS
coronavirus. In this model, ORF-9b is located
on the cytosolic side of intracellular vesicles,
serving as an attachment point for components
of the nascent virus and/or as a modulator
of membranes. Currently it is unknown
whether ORF-9b is present in the assembled
virus particle; however, the homologous protein
is known to be present in particles of
MHV (Fischer et al., 1997).


Figure 5. The Structural Evolution of Alternative Open Reading Frames in Viruses

(A) The evolution of coronaviruses (adapted from Snijder et al., 2003). Based on their genome sequence, coronaviruses fall into three main
groups. SARS-CoV is thought to be an early split-off from the group 2 lineage (indicated by a dashed circle). Alternative open reading frames
of the nucleocapsid gene are found only in group 2 viruses, such as MHV and BCV (Senanayake and Brian, 1997), as well as group 2-related
viruses, including SARS-CoV. BCV, bovine coronavirus; HCoV-229E, human coronavirus 229E; IBV, infectious bronchitis virus; MHV, murine
hepatitis virus; PDEV, porcine epidemic diarrhea virus; SARS-CoV, severe acute respiratory syndrome coronavirus; TGEV, transmissible
gastroenteritis virus.

(B) A model for the structural evolution of ORF-9b within the SARS-CoV N-gene. Starting from an N-gene without an alternative ORF, the protein
first arises as an “accidental” translation product, which is mostly unstructured. By gradual constrained evolution, it becomes increasingly
structured, eventually attaining its present fold. In this scheme, disordered regions (colored in red) are a relict of the evolutionary trajectory of the
protein. The N-protein is represented as a composite of two NMR structures of its well-conserved N- and C-terminal domains (Chang et al., 2005a;
Huang et al., 2004), which are thought to be surrounded by flexible linkers (Chang et al., 2005b) (colored in red). The region of the N-terminal
domain which overlaps with ORF-9b is shown in green.

(C) For comparison, an illustration of the HIV-1 vpu and env genes, which partially overlap. The NMR structure of the overlapping portion of VPU
(Willbold et al., 1997) (PDB code: 1VPU) is shown. This protein is relatively poorly ordered (rmsd = 1.6 Å between multiple determinations of the
fold, for all Cα atoms). The least ordered regions (rmsd > 2 Å) are highlighted in red.


Implications for the Structural Evolution of 
Alternative Open Reading Frames 

Alternative open reading frames are coupled to their cor- 
responding “conventional” reading frame on a genetic 
level (in the case of ORF-9b, the conventional ORF is 
the SARS-CoV N-protein). Changes in the nucleotide
sequence will therefore affect both the conventional and
alternative ORF, limiting the rate and extent to which
the corresponding proteins can evolve. The result will
be a “constrained evolution,” an idea which has been
demonstrated for regions of viral genes which (partially)
overlap (Mizokami et al., 1997). Because ORF-9b is entirely
contained within its corresponding conventional
gene, the constraint applies to the entire protein. The 
properties of coupled gene products are likely to be 
suboptimal compared to proteins that can evolve inde- 
pendently. In the early stages of the evolution of the 
alternative reading frame, this protein might be ex- 
pected to be rather poorly folded, whereas eventually 
there would presumably be a balance, struck by selec- 
tion, with both protein products being somewhat com- 
promised. Thus, the structural disorder observed in 
ORF-9b (evinced in the considerable divergence of the 
eight noncrystallographically related molecules; see 
Table 1) may reflect its relatively recent invention and 
the hindrance to its rapid evolution imposed by the
coupling to the essential N-gene. 
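
The coupling between overlapping reading frames can be illustrated with a minimal sketch that translates a sequence in two frames and tallies how many single-nucleotide substitutions alter neither, one, or both protein products. The toy sequence and function names are hypothetical, the standard genetic code is assumed, and the tally is an illustration of the constraint rather than a model of coronavirus evolution.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, frame):
    """Translate every complete codon of seq starting at the given frame offset."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


def substitution_effects(seq, frames=(0, 1)):
    """Count point substitutions by the number of reading frames they change."""
    reference = [translate(seq, f) for f in frames]
    counts = {0: 0, 1: 0, 2: 0}
    for pos, base in product(range(len(seq)), BASES):
        if base == seq[pos]:
            continue
        mutant = seq[:pos] + base + seq[pos + 1:]
        changed = sum(translate(mutant, f) != ref for f, ref in zip(frames, reference))
        counts[changed] += 1
    return counts


# Hypothetical toy overlap: frame 0 holds one ORF, frame 1 an internal ORF.
TOY_OVERLAP = "ATGGATGGCCCAGTAACGTAA"
print(substitution_effects(TOY_OVERLAP))

# A change that is silent in frame 0 (GAT -> GAC, Asp -> Asp) still alters frame 1.
mutant = TOY_OVERLAP[:5] + "C" + TOY_OVERLAP[6:]
print(translate(mutant, 0) == translate(TOY_OVERLAP, 0))
print(translate(mutant, 1) == translate(TOY_OVERLAP, 1))
```

The last two lines show the core of the argument: a substitution at a third codon position, which would be neutral for an isolated gene, falls on a second codon position in the overlapping frame and changes that protein.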

Support for this general hypothesis comes from a 
system which shares some similarity with the case presented
here: in lentiviruses, the vpu and env genes,
although not strictly alternative ORFs, share a considerable
overlapping region (Figure 5C). Similar to ORF-9b,
vpu encodes a small accessory protein which is present
only in a subfamily of lentiviruses (namely, human immu- 
nodeficiency virus type 1, HIV-1). Although various mo- 
lecular functions have been described for the protein, 
its precise role in HIV-1 pathogenesis remains unclear 
(Hout et al., 2004). In the vpu gene, the region which
overlaps with the neighboring env gene encodes its 
cytoplasmic domain, whose structure has been deter- 
mined by NMR (Willbold et al., 1997). This small domain 
shows two well-defined helical regions and, like ORF- 
9b, a considerable amount of poorly defined structure 
(Figure 5C). 


Conclusions 

The structure of ORF-9b, an intertwined dimer with 
an amphipathic outer surface and a long hydrophobic 
lipid binding tunnel, suggests how this protein may 
interact, via an unusual anchoring mechanism, with 
compartments of the ER-Golgi network to act as an ac- 
cessory protein during the assembly of the SARS virion. 
These unusual structural and functional properties are 
probably a reflection of the distinct evolutionary trajec- 
tory of alternative open reading frames. Indeed, we pro- 
pose that the constraints under which alternative open 
reading frames evolve give rise to characteristic struc- 
tural features, most importantly the potential for novel 
folds and structural disorder. While further studies and 
structures will be required to test this hypothesis, alter- 
native open reading frames provide powerful model sys- 
tems to test ideas about protein folding and evolution. 


Experimental Procedures 


Cloning and Production of Selenomethionine-Labeled Protein 
for Structural Studies 

The coding sequence of SARS-CoV ORF-9b was amplified by PCR
(forward primer: 5'-GGGGACAAGTTTGTACAAAAAAGCAGGCTTCGAAGGAGATAGAACCATGCATCACCATCACCATCACATGGACCCCAATCAAACCAACG-3';
reverse primer: 5'-GGGGACCACTTTGTACAAGAAAGCTGGGTCTCATTTTGCCGTCACCACCACG-3'). The
forward primer encodes a start codon and hexahistidine tag N-ter- 
minal to the gene and both forward and reverse primers contain 
the attB site of the Gateway cloning system (Invitrogen). The PCR 
fragments were subcloned into the pDEST14 plasmid (Invitrogen). 
For expression of the selenomethionine-labeled protein, the 
pDEST14 plasmid was transformed into E. coli strain Rosetta pLysS 
(Novagen). Cultures were grown in selenomethionine media (Molecular
Dimensions) spiked with 1% glucose as well as lysine, threonine,
and phenylalanine (100 µg/ml each) and leucine, isoleucine,
valine, and selenomethionine (50 µg/ml each). Cells were grown at
310 K until an OD595 nm of 0.6 was reached, and then cooled to
293 K for 30 min. Expression was induced by the addition of iso- 
propylthiogalactoside to a final concentration of 0.5 mM, and the 
cultures were grown for a further 20 hr at 293 K. The cells were har- 
vested by centrifugation at 12,000 x g for 30 min and the bacterial 
pellets were resuspended in 50 mM Tris-HCl (pH 8.0), 500 mM 
NaCl, 1% Tween-20, 10 mM imidazole. The cells were lysed using 
a cell disruptor (Constant Systems) and the sample was clarified 
by centrifugation at 30,000 x g for 30 min. To the supernatant, 
2 ml of Ni-NTA superflow (Qiagen) was added and the sample was 
stirred for 2 hr at 4°C. The Ni-NTA beads were subsequently washed
with 50 mM Tris-HCl (pH 8.0), 500 mM NaCl, 20 mM imidazole and
protein was eluted in 50 mM Tris-HCl (pH 8.0), 500 mM NaCl, 
500 mM imidazole. The eluted material was further purified by gel fil- 
tration in 50 mM Tris-HCl (pH 8.0), 200 mM NaCl, 1 mM dithiothreitol 
using a HiLoad 16/60 Superdex 200 column (Amersham Biosci- 
ences). Full selenomethionine incorporation of the purified protein 
was confirmed by mass spectrometry (data not shown). 
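
The statement that the forward primer encodes a start codon and hexahistidine tag N-terminal to the gene can be checked by translating the primer from its first ATG. This is a minimal sketch assuming the standard genetic code; the function name is illustrative, and the primer sequence is the one quoted above.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

FORWARD_PRIMER = (
    "GGGGACAAGTTTGTACAAAAAAGCAGGCTTC"
    "GAAGGAGATAGAACCATGCATCACCATCACCATCACATGGACCC"
    "CAATCAAACCAACG"
)


def translate_from_first_atg(dna):
    """Translate complete codons starting at the first ATG in the sequence."""
    start = dna.find("ATG")
    if start < 0:
        return ""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(start, len(dna) - 2, 3))


peptide = translate_from_first_atg(FORWARD_PRIMER)
print(peptide)  # MHHHHHHMDPNQTN
```

The translation begins with a methionine and six histidines, followed by a second methionine that marks the start of the ORF-9b coding sequence carried by the primer.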


Crystallization and Data Collection 

Crystallization was carried out using a Cartesian robotic dispensing 
system (Genomic Solutions) (Walter et al., 2005) and crystal growth 
was monitored using the OPPF storage and imaging system (Mayo 
et al., 2005). Diffraction quality crystals grew within 1-2 days in 
300 nl sitting drops containing 6.3 mg/ml purified ORF-9b protein, 
11% PEG-3350, 66 mM MgCl2, 33 mM Tris-HCl (pH 8.2), 20 mM 
NAD equilibrated against a reservoir solution of 32% PEG-3350, 
200 mM MgCl2, 100 mM Tris-HCl (pH 8.2). 
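
The drop composition quoted above is close to what is obtained by mixing one volume of reservoir solution with two volumes of protein solution. The mixing ratio is not stated in the text, so the 1:2 ratio in the following sketch is an assumption, as are the function and variable names.

```python
def dilute(reservoir, reservoir_parts=1, protein_parts=2):
    """Return component concentrations after mixing reservoir with protein solution.

    The default 1:2 ratio is an assumption; it is not stated in the text.
    """
    factor = reservoir_parts / (reservoir_parts + protein_parts)
    return {name: conc * factor for name, conc in reservoir.items()}


# Reservoir solution as given in the text.
reservoir = {
    "PEG-3350 (%)": 32.0,
    "MgCl2 (mM)": 200.0,
    "Tris-HCl (mM)": 100.0,
}

drop = dilute(reservoir)
for name, conc in drop.items():
    print(f"{name}: {conc:.1f}")
```

Under this assumption the computed values agree with the reported drop concentrations to within rounding; NAD and protein are contributed by the protein solution and are not modeled here.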

Crystals were cryoprotected in perfluoropolyether oil PFO-X125/03 
(Lancaster) and flash-frozen in liquid nitrogen (100 K). 
Single-wavelength anomalous diffraction data were collected at the ESRF 
beamline BM14 (Grenoble, France) and processed using the pro- 
grams DENZO and SCALEPACK (Otwinowski and Minor, 1997). 
The data were weak, especially at higher resolution, as reflected in 
the poor R factors (Table 1); however, the high redundancy means 
that the merged data were reasonably reliable. 


Model Building and Refinement 

The selenium substructure was solved using the programs 
HKL2MAP and SHELXD (McRee, 1999; Pape and Schneider, 2004; 
Schneider and Sheldrick, 2002; Sheldrick, 2002). Phase refinement 
was carried out using SHARP (de La Fortelle and Bricogne, 1997), 
maps were calculated using CCP4 programs (CCP4, 1994), sol- 
vent-flattened using Pirate (Cowtan, 2000), and averaged using 
GAP (J.M.G. and D.I.S., unpublished program). The presence of par- 
tially disordered regions, together with the complex topology of the 
protein, made the interpretation of the crystallographic data chal- 
lenging. Model building and refinement used the programs Coot 
(Emsley and Cowtan, 2004) and CNS (Brünger et al., 1998). Initial 
refinement imposed strict 8-fold noncrystallographic symmetry (NCS) 
constraints. In further rounds of refinement, 8-fold NCS restraints 
were used. Data processing, refinement, and model building statis- 
tics are shown in Table 1. 


Calculation of the Hydrophobic Pore Dimensions and Ligand Occupancy 

The dimensions of the pockets of the structures of ORF-9b, polio- 
virus (Filman et al., 1989), and human CD1b (Gadola et al., 2002) 
were determined as the volume accessible to a solvent probe with a 
diameter of ~1.4 Å. For the calculation of the ligand volume, only 
those ligand atoms located within the pore were included (atom 
diameter, ~1.4 Å). All calculations were carried out using the 
program VOLUMES (R.M. Esnouf, unpublished computer program, 
personal communication). 
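
The algorithm used by VOLUMES is not described here, so the following is only a generic grid-based sketch of a probe-accessible volume calculation, not that program. It counts grid points at which the center of a probe (radius 0.7 Å, from the ~1.4 Å diameter quoted above) lies clear of every atom. The function name, the grid spacing, the bounding box, and the single test atom are all hypothetical.

```python
def accessible_volume(atoms, box_min, box_max, probe_radius=0.7, step=0.25):
    """Estimate the volume (in cubic angstroms) open to a probe centre.

    atoms: list of (x, y, z, radius) tuples in angstroms.
    A grid point counts as accessible when it lies at least
    (atom radius + probe radius) from every atom centre.
    """
    counts = [int(round((hi - lo) / step)) for lo, hi in zip(box_min, box_max)]
    accessible = 0
    for i in range(counts[0]):
        x = box_min[0] + (i + 0.5) * step
        for j in range(counts[1]):
            y = box_min[1] + (j + 0.5) * step
            for k in range(counts[2]):
                z = box_min[2] + (k + 0.5) * step
                if all(
                    (x - ax) ** 2 + (y - ay) ** 2 + (z - az) ** 2
                    >= (ar + probe_radius) ** 2
                    for ax, ay, az, ar in atoms
                ):
                    accessible += 1
    return accessible * step ** 3


# Hypothetical test case: one atom of radius 1.7 in an 8 x 8 x 8 box.
box_min, box_max = (-4.0, -4.0, -4.0), (4.0, 4.0, 4.0)
empty = accessible_volume([], box_min, box_max)
one_atom = accessible_volume([(0.0, 0.0, 0.0, 1.7)], box_min, box_max)
print(empty, one_atom)
```

A real cavity calculation would additionally need to restrict the grid to the enclosed pore rather than a rectangular box; that step is omitted in this sketch.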


Characterization of the Lipid Ligand by Mass Spectrometry 
Concentrated purified ORF-9b protein was added to a chloroform- 
water two-phase mixture and vortexed. After a 5 min equilibration 
at room temperature, the hydrophilic phase was removed. The hy- 
drophobic phase containing the ligand molecule was diluted into 
a mixture of chloroform, methanol, and water (10:10:3) and analyzed 
by positive ion electrospray mass spectrometry on a Waters-Micro 
Q-TOF mass spectrometer. 


Purification under Denaturing Conditions, Refolding, and Biophysical Experiments 

ORF-9b protein was expressed in E. coli as described in Experimen- 
tal Procedures. The protein was then purified under denaturing con- 
ditions as follows: cells were harvested by centrifugation at 
12,000 x g for 30 min and the bacterial pellet was resuspended in 
6 M guanidine hydrochloride, 50 mM Tris-HCl (pH 8.0), 300 mM 
NaCl, 1% Tween-20, 10 mM imidazole. The cells were lysed by son- 
ication and the sample was clarified by centrifugation at 30,000 x g 
for 30 min. To the supernatant, 2 ml of Ni-NTA superflow (Qiagen) 
was added and the sample was stirred for 2 hr at 4°C. The Ni-NTA 
beads were then washed with 6 M guanidine hydrochloride, 50 mM 
Tris-HCl (pH 8.0), 300 mM NaCl, 20 mM imidazole and protein was 
eluted in 6 M guanidine hydrochloride, 50 mM Tris-HCl (pH 8.0), 
300 mM NaCl, 500 mM imidazole. 

The protein was refolded by rapid dilution into 200 mM Tris-HCl 
(pH 8.0), 1 M L-arginine, 0.1 mM phenylmethylsulfonylfluoride and 
incubated for 24 hr at 4°C. The refolding mixture was then concen- 
trated and purified by gel filtration in 50 mM Tris-HCl (pH 8.0), 200 
mM NaCl using a HiLoad 16/60 Superdex 200 column (Amersham 
Biosciences). The elution position of refolded ORF-9b was identical 
to that of natively purified protein. 

Mass spectrometry analysis confirmed that refolded ORF-9b contained 
only trace amounts of the lipid ligand (data not shown). Dynamic 
light scattering revealed that both refolded and natively purified 
protein have similar hydrodynamic behavior (Rh, natively purified = 
2.27; Rh, refolded = 2.23) and analytical ultracentrifugation of refolded 
ORF-9b gave a profile similar to that shown in Supplemental Data. 

To test the effect of the lipid removal on the stability of the protein, 
we carried out a ThermoFluor assay (Lo et al., 2004; Pantoliano et al., 
2001). In this experiment, ORF-9b was heated from 20°C to 95°C in 
the presence of a fluorescent dye, SYPRO Orange (Molecular 
Probes). As the protein unfolds, hydrophobic residues become sol- 
vent exposed, leading to an increase in the fluorescence of the dye, 
measured in a real-time PCR machine (Bio-Rad Opticon 2). 
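
The text does not state how the melting curves were analyzed. One common approach, shown here only as a minimal sketch, takes the unfolding midpoint as the temperature at which fluorescence rises most steeply. The function name, the 0.5°C step size, and the synthetic two-state curve with a midpoint at 60°C are hypothetical and are not data from this study.

```python
import math


def melting_temperature(temps, fluorescence):
    """Return the temperature at the steepest rise in fluorescence."""
    best_slope = float("-inf")
    best_temp = None
    for i in range(1, len(temps)):
        slope = (fluorescence[i] - fluorescence[i - 1]) / (temps[i] - temps[i - 1])
        if slope > best_slope:
            best_slope = slope
            # Report the midpoint of the interval with the largest slope.
            best_temp = 0.5 * (temps[i] + temps[i - 1])
    return best_temp


# Synthetic curve over the 20-95 degree range used in the assay.
temps = [20.0 + 0.5 * i for i in range(151)]
curve = [1.0 / (1.0 + math.exp((60.0 - t) / 2.0)) for t in temps]

tm = melting_temperature(temps, curve)
print(f"Estimated midpoint: {tm:.2f} C")
```

Real traces are noisy and usually fall again after the transition as the dye-bound protein aggregates, so smoothing or fitting a sigmoid to the transition region would normally precede this step.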


ORF-9b Expression in Mammalian Cells 
N- and C-terminally hexahistidine-tagged ORF-9b constructs were 
prepared by PCR and subcloned into the mammalian expression 
vector pLEXm (Aricescu et al., 2006) for transient expression tests. 
For the subcellular localization analysis, the ORF-9b constructs 
were transfected into COS7 cells grown on four-well BD Falcon 
glass culture slides (BD Biosciences) using Lipofectamine (Invitro- 
gen). The cells were processed for immunofluorescence 48 hr later. 
Briefly, the procedure was as follows: cells were fixed in 4% parafor- 
maldehyde, permeabilized with 0.2% Triton X-100, and blocked with 
a mixture of 1% bovine serum albumin (BSA) and 0.25% Triton X-100 
(all solutions were made in PBS [pH 7.4], and all incubations were 
done at room temperature). The His-tagged ORF-9b was detected 
using the penta-His Alexa Fluor 555 monoclonal antibody (Qiagen) 
diluted 1:150 in PBS containing 3% BSA and 0.05% Triton X-100. 
The slides were washed, mounted in FluorSave reagent (Calbio- 
chem), and imaged using a Nikon Eclipse TE2000U inverted micro- 
scope. Differential interference contrast and epifluorescence im- 
ages were taken using a 60x oil objective with a Hamamatsu Orca 
285 CCD camera and processed with IP Lab imaging software. 
ORF-9b expression was also analyzed by Western blotting. 
HEK293T cells were transiently transfected with the ORF-9b 
His-tagged constructs, collected 48 hr later, lysed by repeated 
passage through a 22G needle, and centrifuged for 10 min at 3000 x g. 
Samples were loaded in a 15% acrylamide gel, separated by SDS- 
PAGE, and electroblotted onto a Hybond-C nitrocellulose mem- 
brane (Amersham Pharmacia Biotechnology). The membrane was 
blocked overnight in 5% skimmed dry milk in PBS and probed for 
1 hr at room temperature with the penta-His monoclonal antibody 
(Qiagen; 1:1000 dilution). The secondary antibody used was goat 
anti-mouse IgG (Fc-specific) horseradish peroxidase (Sigma; 
1:2000 dilution). Chemiluminescence detection was performed 
using the ECL kit (Amersham Pharmacia Biotechnology). 


Figures 

Figures were generated with Bobscript (Esnouf, 1997) and PyMOL 
(DeLano, 2002) and rendered with POV-Ray (Persistence of Vision 
Ltd., Williamstown, Victoria, Australia). 


Supplemental Data 

Supplemental Data include analytical ultracentrifugation results and 
sequence alignments and are available with this article online at 
http://www.structure.org/cgi/content/full/14/7/1157/DC1/. 